TensorFlow Classifiers

In this article, we demonstrate how to implement the TensorFlow linear classifier model through an example. The example uses a breast cancer diagnosis dataset; the details regarding this dataset can be found in the Diagnostic Wisconsin Breast Cancer Database.

Dataset

The dataset contains 569 instances and 32 attributes. The objective of the exercise is to create a classification model that can classify the type of Diagnosis based on the rest of the attributes. First, however, let's plot a count plot for the Diagnosis attribute to see how the classes are distributed.
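A count plot draws one bar per class, with a height equal to the number of instances in that class. The sketch below computes those numbers only. The label list is a toy placeholder rather than the real data, and the "M"/"B" codes for malignant and benign are an assumption about how Diagnosis is encoded.

```python
from collections import Counter

# Toy placeholder labels (NOT the real dataset): "M" = malignant, "B" = benign.
diagnosis = ["B", "M", "B", "B", "M", "B", "B", "M", "B", "B"]

def class_counts(labels):
    """Return {class: (count, fraction)}: the numbers a count plot displays."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: (n, n / total) for cls, n in counts.items()}

counts = class_counts(diagnosis)
for cls, (n, frac) in sorted(counts.items()):
    print(f"{cls}: {n} instances ({frac:.0%})")
```

Checking the class fractions first also shows whether stratified splitting is worth the effort later on.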

Features with high variance

Moreover, features with high variance can hurt the modeling process, because features measured on a large scale can dominate the others during training. For this reason, we standardize the features by removing the mean and scaling to unit variance.
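As a minimal sketch of this standardization, the function below subtracts each column's mean and divides by its standard deviation. It assumes NumPy arrays and, as a design choice not stated in the article, computes the statistics on the training data only so that nothing leaks from the test set. The toy arrays are illustrative.

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale columns to zero mean and unit variance using training statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0.0] = 1.0  # leave constant columns unchanged instead of dividing by zero
    return (X_train - mean) / std, (X_test - mean) / std

# Toy data: two features on very different scales.
X_train = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]])
X_test = np.array([[2.0, 2000.0]])
X_train_s, X_test_s = standardize(X_train, X_test)

print(X_train_s.mean(axis=0), X_train_s.std(axis=0))
```

scikit-learn's StandardScaler applies the same transformation when it is fitted on the training set and then used to transform the test set.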

Train and Test sets

To build the train and test sets, we use StratifiedKFold, a variation of k-fold that returns stratified folds: each set contains approximately the same percentage of samples of each target class as the complete set.
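To illustrate what stratification does, here is a simplified stand-in for StratifiedKFold, not scikit-learn's implementation. It shuffles the indices of each class separately and deals them round-robin across the folds. The function name, the seed, and the toy labels are assumptions made for the example.

```python
import numpy as np

def stratified_folds(y, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class proportions."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        # Deal this class's samples round-robin across the folds.
        fold_of[idx] = np.arange(len(idx)) % n_splits
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)

y = np.array([0] * 12 + [1] * 8)  # toy labels: 60% class 0, 40% class 1
folds = list(stratified_folds(y, n_splits=4))
for train_idx, test_idx in folds:
    print(len(test_idx), y[test_idx].mean())
```

With 12 and 8 samples split four ways, every test fold receives three samples of class 0 and two of class 1, so each fold keeps the 60/40 ratio of the full set.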

Modeling: TensorFlow Linear Classifier

Here, we use the TensorFlow linear classifier model, tf.estimator.LinearClassifier.

Input Function

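The input function specifies how data is converted to a tf.data.Dataset that feeds the input pipeline in a streaming fashion. More precisely, an input function returns a tf.data.Dataset object that yields two-element tuples (features, labels): features is a dictionary that maps each feature name to a tensor of that feature's values, and labels is a tensor that holds the label of every example in the batch.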
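A minimal sketch of such an input function is shown below, assuming TensorFlow 2.x with eager execution. The helper name make_input_fn, its parameters, and the toy column names are hypothetical and are not taken from the article's code.

```python
import numpy as np
import tensorflow as tf

def make_input_fn(features, labels, batch_size=32, shuffle=True, num_epochs=1):
    """Return a zero-argument function that builds a Dataset of (features, labels)."""
    def input_fn():
        ds = tf.data.Dataset.from_tensor_slices((dict(features), labels))
        if shuffle:
            ds = ds.shuffle(buffer_size=len(labels))
        return ds.batch(batch_size).repeat(num_epochs)
    return input_fn

# Toy stand-in data; the column names are hypothetical.
features = {
    "radius_mean": np.array([14.1, 20.6, 11.4, 17.9], dtype=np.float32),
    "texture_mean": np.array([19.3, 25.7, 14.2, 21.0], dtype=np.float32),
}
labels = np.array([0, 1, 0, 1], dtype=np.int64)

train_input_fn = make_input_fn(features, labels, batch_size=2)
batch_features, batch_labels = next(iter(train_input_fn()))
print(sorted(batch_features.keys()), batch_labels.shape)
```

Returning a closure rather than the dataset itself lets the caller decide when the pipeline is built, which is what estimator-style training loops typically expect.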

Moreover, an estimator model consists of two main parts: feature columns and a numeric vector. Feature columns describe how the input numeric vector should be interpreted. A helper function can separate the categorical and numerical columns (features) and return a descriptive list of feature columns.
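The separation step can be sketched without any library, as below. This is an illustrative stand-in rather than the article's function: it returns plain dictionaries, whereas in TensorFlow versions that ship the feature-column API the two kinds would typically map to tf.feature_column.numeric_column and tf.feature_column.categorical_column_with_vocabulary_list. The column names and values are hypothetical.

```python
def describe_feature_columns(columns):
    """Split columns into numeric and categorical and return one descriptor per column.

    `columns` maps a column name to its list of values.
    """
    described = []
    for name, values in columns.items():
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            described.append({"name": name, "kind": "numeric"})
        else:
            # Categorical columns carry their vocabulary (sorted unique values).
            vocab = sorted({str(v) for v in values})
            described.append({"name": name, "kind": "categorical", "vocabulary": vocab})
    return described

# Toy columns; names and values are hypothetical.
columns = {
    "radius_mean": [14.1, 20.6, 11.4],
    "site": ["left", "right", "left"],
}
feature_columns = describe_feature_columns(columns)
for col in feature_columns:
    print(col)
```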

Estimator using the Default Optimizer

ROC Curves
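An ROC curve traces the true positive rate against the false positive rate as the decision threshold is lowered. The following sketch computes those points and the area under the curve with NumPy. It assumes binary labels and distinct scores, and the toy labels and scores are illustrative rather than results from the article's models.

```python
import numpy as np

def roc_curve_points(y_true, y_score):
    """Return (fpr, tpr) arrays obtained by sweeping the threshold from high to low.

    Assumes distinct scores; tied scores would need to be grouped.
    """
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score), kind="stable")
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted == 1)  # true positives at each threshold
    fps = np.cumsum(y_sorted == 0)  # false positives at each threshold
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_curve_points(y_true, y_score)
print(auc(fpr, tpr))  # 0.75
```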

Confusion Matrix

A confusion matrix allows for visualization of the performance of a classification algorithm by tabulating the predicted classes against the actual classes. Note that, due to the size of the data, we do not provide a cross-validation evaluation here, although in general this type of evaluation is preferred.
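As a small illustration, the sketch below builds a binary confusion matrix and derives a few common metrics from it. The row and column convention (rows are true classes, columns are predicted classes) is an assumption of this example, and the labels and predictions are toy values.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Count predictions: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)

tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(cm)
print(accuracy, precision, recall)
```

The off-diagonal cells separate false positives from false negatives, a distinction that matters in a diagnostic setting where the two kinds of error have different costs.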

Estimator using the FTRL Optimizer with Regularization

The Follow the Regularized Leader (FTRL) optimizer is an implementation of the FTRL-Proximal online learning algorithm, which is commonly applied to binomial logistic regression and supports regularization of the weights (for details, see [6]).
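To make the update rule concrete, here is a sketch of per-coordinate FTRL-Proximal for logistic regression with L1 and L2 regularization, written in NumPy. It is not TensorFlow's implementation, and the class name, hyperparameter values, and toy data are assumptions chosen for illustration.

```python
import numpy as np

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for binary logistic regression (illustrative sketch)."""

    def __init__(self, n_features, alpha=0.5, beta=1.0, l1=0.01, l2=0.01):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = np.zeros(n_features)  # accumulated adjusted gradients
        self.n = np.zeros(n_features)  # accumulated squared gradients

    def weights(self):
        # Closed-form solution of the proximal step; L1 zeroes small coordinates.
        w = -(self.z - np.sign(self.z) * self.l1) / (
            (self.beta + np.sqrt(self.n)) / self.alpha + self.l2
        )
        w[np.abs(self.z) <= self.l1] = 0.0
        return w

    def predict_proba(self, x):
        return 1.0 / (1.0 + np.exp(-np.dot(self.weights(), x)))

    def update(self, x, y):
        w = self.weights()
        p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
        g = (p - y) * x  # gradient of the log loss for this example
        sigma = (np.sqrt(self.n + g * g) - np.sqrt(self.n)) / self.alpha
        self.z += g - sigma * w
        self.n += g * g

# Toy, linearly separable data; the first column is a constant bias feature.
X = np.array([[1.0, 2.0], [1.0, -2.0], [1.0, 1.0], [1.0, -1.0]])
y = np.array([1, 0, 1, 0])

model = FTRLProximal(n_features=2)
for _ in range(50):
    for xi, yi in zip(X, y):
        model.update(xi, yi)

probs = np.array([model.predict_proba(xi) for xi in X])
print(probs.round(3))
```

Each coordinate gets its own effective learning rate through the accumulated squared gradients, and the L1 term sets a weight to exactly zero whenever its accumulated gradient stays small.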

ROC Curves

Confusion Matrix

Estimator using an Optimizer with a Learning Rate Decay
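With learning rate decay, the step size shrinks as training progresses (see [7]). The sketch below shows one common schedule, exponential decay. The article does not state which schedule or values it uses, so the function and its parameters are assumptions for illustration.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=False):
    """Learning rate after `step` updates: initial_lr * decay_rate ** (step / decay_steps)."""
    exponent = step / decay_steps
    if staircase:
        exponent = step // decay_steps  # hold the rate constant within each interval
    return initial_lr * decay_rate ** exponent

for step in (0, 1000, 2000):
    print(step, exponential_decay(0.1, step, decay_steps=1000, decay_rate=0.5))
```

Here the rate halves every 1000 steps. With staircase=True it drops in discrete jumps instead of continuously, which mirrors the option offered by tf.keras.optimizers.schedules.ExponentialDecay.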

ROC Curves

Confusion Matrix


References

  1. Regression analysis Wikipedia page
  2. TensorFlow tutorials
  3. W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pages 861-870, San Jose, CA, 1993.
  4. O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pages 570-577, July-August 1995.
  5. W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994) 163-171.
  6. Online machine learning Wikipedia page
  7. Learning rate Wikipedia page